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Abstract.  We  consider  smooth  stochastic  programs  and  develop  a  discrete-time  optimal-control 
problem  for  adaptively  selecting  sample  sizes  in  a  class  of  algorithms  based  on  sample  average 
approximations  (SAA).  The  control  problem  aims  to  minimize  the  expected  computational  cost 
to  obtain  a  near-optimal  solution  of  a  stochastic  program  and  is  solved  approximately  using  dy¬ 
namic  programming.  The  optimal-control  problem  depends  on  unknown  parameters  such  as  rate 
of  convergence,  computational  cost  per  iteration,  and  sampling  error.  Hence,  we  implement  the 
approach  within  a  receding-horizon  framework  where  parameters  are  estimated  and  the  optimal- 
control  problem  is  solved  repeatedly  during  the  calculations  of  a  SAA  algorithm.  The  resulting 
sample-size  selection  policy  consistently  produces  near-optimal  solutions  in  short  computing  times 
as  compared  to  other  plausible  policies  in  several  numerical  examples. 

1  Introduction 

Stochastic  programs  that  aim  to  minimize  the  expectations  of  random  functions  are  rarely  solvable 
by  direct  application  of  standard  optimization  algorithms.  The  sample  average  approximations 
(SAA)  approach  is  a  well-known  framework  for  solving  such  difficult  problems  where  a  standard 
optimization  algorithm  is  applied  to  an  approximation  of  the  stochastic  program  obtained  by  re¬ 
placing  the  expectation  by  its  sample  average.  SAA  is  intuitive,  simple,  and  has  a  strong  theoretical 
foundation;  see  Chapter  5  of  [41]  for  a  summary  of  results  and  [20,  45,  19,  1]  for  examples  of  ap¬ 
plications.  However,  the  framework  suffers  from  a  main  difficulty:  what  is  an  appropriate  sample 
size?  A  large  sample  size  provides  good  accuracy  in  SAA,  but  results  in  a  high  computational  cost. 
A  small  sample  size  is  computationally  inexpensive,  but  gives  poor  accuracy  as  the  sample  average 
only  coarsely  approximates  the  expectation.  It  is  often  difficult  in  practice  to  select  a  suitable 
sample  size  that  balances  accuracy  and  computational  cost  without  extensive  trial  and  error. 

There  is  empirical  evidence  that  a  variable  sample  size  during  the  calculations  of  SAA  may 
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reduce  the  computing  time  compared  to  a  fixed  sample-size  policy  [42,  13,  12,  2,  34,  29,  3,  26]. 
This  is  often  caused  by  the  fact  that  substantial  objective  function  improvements  can  be  achieved 
with  small  sample  sizes  in  the  early  stages  of  the  calculations.  In  addition,  convergence  of  iterates 
to  optimal  and  stationary  solutions  can  typically  only  be  ensured  if  the  sample  size  is  increased  to 
infinity,  see,  e.g.,  [43].  There  is  also  ample  empirical  evidence  from  other  fields  such  as  semi-infinite 
programming  [9,  35],  minimax  optimization  [48,  30],  and  optimal  control  [39,  4,  27]  that  adaptive 
precision-adjustment  schemes  may  reduce  computing  times. 

It  is  extremely  difficult  for  a  user  to  select  not  only  one,  but  multiple  sample  sizes  that  overall 
balance  computational  cost  and  accuracy.  Clearly,  the  number  of  possible  sample  sizes  is  infinite 
and  the  interaction  between  different  stages  of  the  calculations  complicates  the  matter.  This  paper 
addresses  the  issue  of  how  to  best  vary  the  sample  size  in  SAA  so  that  a  near-optimal  solution  can 
be  obtain  in  short  computing  time.  We  develop  a  novel  approach  to  sample-size  selection  based  on 
discrete-time  optimal  control  and  closed-loop  feedback. 

While  the  issue  of  sample-size  selection  arises  in  all  applications  of  SAA,  this  paper  deals  with 
the  specific  case  of  smooth  stochastic  programs  where  the  sample  average  problems  are  approxi¬ 
mately  solved  by  standard  nonlinear  programming  algorithms.  Consequently,  we  assume  that  the 
sample  average  problems  are  smooth  and  their  gradients  can  be  computed  relatively  easily.  This 
case  arises  for  example  in  estimation  of  mixed  logit  models  [2] ,  search  theory  (see  Section  5) ,  and 
engineering  design  [33].  Important  models  such  as  two-stage  stochastic  programs  with  recourse 
[16],  conditional  Value-at-Risk  minimization  [31],  inventory  control  problems  [47],  and  complex  en¬ 
gineering  design  problems  [34]  involve  nonsmooth  random  functions  and  sample  average  problems. 
However,  recent  efforts  to  apply  smooth  approximations  of  nonsmooth  random  functions  appear 
promising  [1,  47].  Hence,  the  results  of  this  paper  may  also  be  applicable  in  such  contexts.  We  illus¬ 
trate  the  use  of  smooth  approximations  in  Section  5.  Applications  with  integer  restrictions  and/or 
functions  whose  gradients  may  not  exist  or  may  not  be  easily  available  are  beyond  the  scope  of  the 
paper;  see  [44,  37,  14,  6]  for  an  overview  of  that  area  of  research.  We  note  that  stochastic  programs 
may  also  be  solved  by  stochastic  approximations  [7,  18,  23]  and  stochastic  decomposition  [10,  15] 
under  suitable  assumptions.  However,  in  this  paper  we  focus  on  SAA. 

Existing  sample-size  selection  policies  for  SAA  aim  at  increasing  the  sample  size  sufficiently 
fast  such  that  the  algorithmic  improvement  (eventually)  dominates  the  sampling  error  leading  to 
convergence  to  optimal  or  stationary  solutions  [43,  42,  13,  2,  34,  26].  We  also  find  studies  of 
consistency  of  SAA  estimators  defined  by  variable  sample  sizes  [12]. 

The  issue  of  determining  a  computationally  efficient  sample-size  selection  policies  has  received 
much  less  attention  than  that  of  asymptotic  convergence.  The  recent  paper  [26]  defines  classes 
of  “optimal  sample  sizes”  that  best  balance,  in  some  asymptotic  sense,  sampling  error  and  rate 
of  convergence  of  the  optimization  algorithm  used  to  minimize  the  sample  average.  These  results 
provide  guidance  how  to  choose  sample  sizes,  but  still  require  the  user  to  select  parameters  that 
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specify  the  exact  sequence  of  sample  sizes  to  use.  We  show  empirically  in  this  paper  that  the  recom¬ 
mendations  of  [26]  may  be  poor  and  highly  sensitive  to  the  selection  of  parameters.  Consequently, 
we  find  a  need  for  sample-size  selection  policies  that  do  not  require  hard-to-select  user  specified 
parameters.  Such  policies  become  especially  important  when  stochastic  programs  are  solved  as 
part  of  decision-support  tools  operated  by  personnel  not  trained  in  mathematical  programming. 

In  [29],  we  eliminate  essentially  all  user  input  and  let  a  solution  of  an  auxiliary  nonlinear 
program  determine  the  sample  size  during  various  stages  of  the  calculations.  The  objective  function 
of  the  nonlinear  program  is  to  minimize  the  computational  cost  to  reach  a  near-optimal  solution. 
Typically,  the  nonlinear  program  depends  on  unknown  parameters,  but  computational  tests  indicate 
that  even  with  estimates  of  these  parameters  the  resulting  sample-size  selection  policy  provides 
reduction  in  computing  times  compared  to  an  alternative  policy.  We  find  similar  efforts  to  efficiently 
control  the  precision  of  function  (and  gradient)  evaluations  or  other  algorithm  parameters  in  the 
areas  of  semi-infinite  programming  [9],  interior-point  methods  [17],  interacting-particle  algorithms 
[21],  and  simulated  annealing  [22]. 

While  we  here  focus  on  obtaining  a  near-optimal  solution,  [3]  deals  with  how  to  efficiently 
estimate  the  quality  of  a  given  sequence  of  candidate  solutions.  The  paper  provides  rules  for 
selecting  variable  sample  sizes  for  the  estimation  at  each  iteration  of  the  procedure.  The  rules 
are  based  on  heuristically  minimizing  the  computational  cost  required  by  the  estimation  procedure 
before  a  termination  criterion  is  met.  The  computational  effort  to  generate  candidate  solutions  is 
not  considered.  The  procedure  requires  the  solution  of  the  sample  average  problems  to  optimality, 
which  may  be  computationally  costly  or,  possibly,  unattainable  in  finite  computing  time  in  the  case 
of  nonlinear  random  functions. 

In  this  paper,  we  view  a  SAA  algorithm  for  solving  a  stochastic  program  as  a  discrete-time 
dynamic  system  subject  to  random  disturbances  due  to  the  unknown  sample  averages.  A  simi¬ 
lar  perspective  is  taken  in  [17]  in  the  context  of  interior-point  methods  for  solving  deterministic 
nonlinear  programs  and  in  [21]  for  interacting-particle  algorithms.  Since  SAA  with  sample  average 
problems  solved  by  nonlinear  programming  algorithms  represent  a  substantial  departure  from  those 
contexts,  we  are  unable  to  build  on  those  studies. 

We  provide  control  inputs  to  the  discrete-time  dynamic  system  by  selecting  sample  sizes  for 
each  stage  of  the  calculations  as  well  as  the  duration  of  each  stage.  Our  goal  is  to  control  the  system 
such  that  the  expected  computing  time  to  reach  a  near-optimal  solution  of  the  stochastic  program 
is  minimized.  As  the  system  (i.e.,  the  algorithm)  is  highly  complex,  we  develop  a  surrogate  model 
of  the  behavior  of  the  system  that  can  be  used  for  real-time  control  of  the  system.  Behavioral 
models  for  algorithms  in  other  areas  of  optimization  are  discussed  in  [25,  38].  The  surrogate  model 
leads  to  a  surrogate  discrete-time  optimal-control  problem  that  we  solve  approximately  by  dynamic 
programming. 

While  the  auxiliary  nonlinear  program  for  sample-size  selection  in  [29]  is  deterministic  and 
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provides  no  feedback  about  observed  realizations  of  sample  averages  and  algorithmic  improvement, 
the  surrogate  optimal-control  problem  in  the  present  paper  accounts  for  the  inherent  uncertainty 
in  SAA  and  the  possibility  of  recourse  in  future  stages  of  the  calculations.  As  the  surrogate  model 
depends  on  unknown  parameters,  we  approximately  solve  the  optimal-control  problem  after  each 
stage  of  the  calculations  to  utilize  the  latest  estimates  of  those  parameters. 

We  obtain  the  surrogate  discrete-time  optimal-control  problem  through  relatively  straight¬ 
forward  derivations,  make  use  of  approximations,  and  estimate  several  unknown  parameters.  In 
spite  of  this,  we  show  in  numerical  examples  that  the  sample-size  selection  policy  generated  by  the 
optimal-control  problem  is  consistently  better  than  the  asymptotically  optimal  policy  of  [26]  and 
other  plausible  polices. 

Our  sample-size  selection  policy  does  not  include  hard-to-select  user  specified  parameters  that 
may  greatly  influence  computing  times.  Hence,  the  policy  is  well  suited  for  implementation  in 
automated  decision-support  tools  and  for  use  by  other  than  experts  in  numerical  optimization. 

In  section  2,  we  define  the  stochastic  program  considered  and  describe  the  sample-size  selection 
problem  as  a  discrete-time  optimal-control  problem.  The  optimal-control  problem  appears  to  be 
unsolvable  and  Section  3  defines  an  alternative,  surrogate  optimal-control  problem  that  is  tractable. 
The  surrogate  optimal-control  problem  depends  on  unknown  parameters  that  are  estimated  by 
procedures  described  in  Section  4.  Section  4  also  describes  the  full  algorithm  which  integrates 
the  surrogate  optimal-control  problem  and  the  parameter  estimation  procedures  within  a  receding- 
horizon  framework.  Section  5  gives  a  summary  of  numerical  results. 

2  Problem  Statements 

2.1  Stochastic  Optimization  Problem  and  Sample  Average  Approximations 

We  consider  the  probability  space  (H,.?-', IP),  with  H  C  IR^  and  T  <Z  2^  being  the  Borel  sigma 
algebra,  and  the  random  function  F  :  IR*^  x  H  ^  IR.  Let  the  expected  value  function  f  :  IR'^  ^  IR 
be  defined  by 

/(x)  :=lE[F(x,u;)],  (1) 

where  IE  denotes  the  expectation  with  respect  to  the  known  probability  distribution  P.  Moreover, 
we  define  the  problem 

P:  min/(x),  (2) 

x£X 

where  X  C  is  a  convex  compact  set.  We  assume  that  F{-,io)  is  continuous  on  X  for  P-almost 
every  to  G  0,  and  that  |F(x,cu)|  is  bounded  by  an  integrable  function  for  all  x  G  A  and  P-almost 
every  a;  G  H.  This  implies  that  /(•)  is  well-defined  and  continuous  on  X  (see  Theorem  7.43  in  [41]). 
Hence,  the  optimal  value  of  P,  denoted  /*,  is  defined  and  finite.  We  denote  the  set  of  optimal 
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solutions  of  P  by  X*  and  the  set  of  e-optimal  solutions  by  X*,  i.e.,  for  any  e  >  0 

x:  :=  {x  €  Xlf(x)  -  r  <  e}.  (3) 

For  general  probability  distributions  P,  we  are  unable  to  compute  f(x)  exactly.  Hence,  we 
approximate  it  using  the  random  sample  average  funetion  ^  P,  G  IN  :=  {1,2,3,...}, 

defined  by 

N 

fN{x)  :=  F{x,  ujj)/N,  (4) 

i=i 

where  wi,  W2, ...,  wtv  is  a  sample  of  size  N  consisting  of  independent  random  vectors  with  distribution 
P.  In  fN{x)  as  well  as  in  other  expressions  below,  we  suppress  the  dependence  on  the  sample  in 
the  notation.  Moreover,  we  denote  a  random  vector  and  its  realization  with  the  same  symbol.  The 
meaning  should  be  clear  from  the  context. 

Various  sample  sizes  give  rise  to  a  family  of  (random)  approximations  of  P.  Let  {PArjArgiN  be 
this  family,  where,  for  any  N  G  IN,  the  (random)  sample  average  problem  Pat  is  defined  by 

Pat:  min/Ar(x).  (5) 

xGX 

Since  /Af(-)  is  continuous  on  X  almost  surely,  the  minimum  value  of  Pat,  denoted  by  is  defined 
and  finite  almost  surely.  Let  X"^  be  the  set  of  optimal  solutions  of  Ptv- 

In  this  paper,  we  aim  to  approximately  solve  P  by  means  of  approximately  solving  a  sequence 
of  problems  of  the  form  Pat  with  varying,  well-selected  N.  We  assume  that  for  any  iV  G  P  there 
exists  a  suitable  algorithm  for  solving  Ptv  given  by  an  algorithm  map  :  X  ^  X .  While  we  state 
the  sample-size  control  problem  below  without  further  assumptions,  we  essentially  throughout  this 
paper  assume  that  the  algorithm  map  is  linearly  convergent  as  formally  stated  in  Section  3.  There 
may  be  many  algorithms  that  obtain  linear  convergence  for  a  given  /Ar(-)-  We  are  motivated  by 
situations  where  F(-,lo)  is  continuously  differentiable  for  IP-almost  every  u;  G  H.  In  such  situations 
the  algorithm  map  may  be  defined  by  one  iteration  of  the  projected  gradient  method  (see,  e.g.,  p. 
66  of  [28])  or  some  other  nonlinear  programming  algorithm  applied  to  Pat.  We  note  that  [26]  also 
considers  linearly  convergent  algorithm  maps. 

While  we  in  this  paper  focus  on  linearly  convergent  algorithm  maps,  the  methodology  is,  in 
principle,  also  applicable  to  superlinearly  convergent  algorithm  maps  as  a  linear  rate  provides  a 
conservative  estimate  of  the  progress  of  a  superlinearly  convergent  algorithm  map.  However,  it  is 
beyond  the  scope  of  the  paper  to  examine  this  aspect  further. 

It  is  well  known  that  under  the  stated  assumption  on  F(-,  ■)  and  independent  sampling,  fN^x) 
converges  to  /(x)  uniformly  on  X ,  as  N  ^  oo,  almost  surely;  see  for  example  Theorem  7.48  in  [41]. 
Now  suppose  that  we  apply  an  algorithm  map  An{-)  to  /7v(‘)  that  is  convergent  in  the  following 
sense:  for  all  N  and  any  sequence  {xjj^Q  generated  by  the  iteration  Xj+i  =  A]\f{xi),  the  sequence 
{xij^Q  tends  to  a  point  in  X"^  almost  surely.  Then,  for  any  e  >  0,  a  sufficiently  large  N  and  a 
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sufficiently  large  number  of  iterations  of  the  algorithm  map  result  in  a  solution  in  X*  almost  surely. 
Unfortunately,  this  simple  approach  has  several  drawbacks.  First,  if  e  is  relatively  close  to  zero, 
both  N  and  the  number  of  iterations  may  be  large  resulting  in  a  high  computational  cost.  Second, 
since  only  a  single  sample  is  used,  it  may  be  difficult  to  estimate  the  variability  in  and,  hence, 
to  estimate  the  quality  of  the  obtained  solution.  Third,  in  practice,  the  algorithm  map  may  only 
guarantee  convergence  when  starting  sufficiently  close  to  In  such  cases,  the  use  of  multiple 
samples  “randomize”  the  sequence  of  iterates  and  therefore  may  increase  the  chance  to  obtain  a 
good  local  minimum.  This  effect  is  not  present  when  we  use  a  single  sample. 

As  argued  above,  a  variable  sample  size  may  in  part  overcome  the  first  drawback  of  the  simple 
approach.  Hence,  we  consider  the  approximate  solution  of  a  sequence  of  problems  with 

typically  increasing  sample  sizes  N^-  While  we  could  have  let  the  sample  for  contain  the 

sample  for  Pat^.,  we  let  Fn^+i  be  independent  of  Pat^,  for  all  k.  This  construction  addresses  the 
second  and  third  drawbacks  discussed  above.  Hence,  we  consider  the  following  stagewise  approach 
where  at  stage  k  an  independent  sample  of  size  Nk  is  generated  from  P.  The  sample  of  a  stage 
is  independent  of  the  samples  of  previous  stages.  We  find  a  similar  stagewise  sampling  scheme  in 
[12].  After  the  sample  generation,  iterations  with  the  algorithm  map  are  carried  out  on 

P ATfc  using  the  generated  sample.  This  approach  is  described  next  in  a  conceptual  algorithm. 

Algorithm  1  (Conceptual  Algorithm  for  P). 

Data.  Optimality  tolerance  e  >  0;  initial  solution  rcg  G  X. 

Step  0.  Set  stage  counter  k  =  1. 

Step  1.  Determine  number  of  iterations  G  IN  and  sample  size  G  IN. 

Generate  an  independent  sample  of  size  N^. 

Step  2.  For  i  =  0  to  Ufc  —  1:  Compute  =  An^{x^)  using  the  sample  generated  in  Step  1. 

Step  3.  If  G  X*,  then  Stop.  Else,  set  Xq'^^  =  replace  /c  by  A:  +  1,  and  go  to  Step  1. 

In  view  of  the  discussion  above,  it  is  clear  that  given  a  particular  stage  k  and  a  convergent  algorithm 
map,  there  exists  an  N'  and  n!  such  that  if  >  N'  and  >  n' ,  then  Algorithm  1  stops  after  the 
k-th.  stage  almost  surely. 

We  note  that  Algorithm  1  resembles  the  classical  batching  approach  to  obtain  a  lower  bound  on 
the  optimal  value  of  a  two-stage  stochastic  program  with  recourse  [20].  In  that  case,  M  independent 
sample  average  problems  Ptv  with  a  fixed  N  are  solved  to  optimality.  In  the  present  context,  we 
do  not  assume  that  F(-,uj)  is  piecewise  linear  or  has  any  other  structure  that  allows  the  solution 
of  Pat  in  finite  time.  Moreover,  we  allow  a  variable  sample  size  and  warm-start  stages,  i.e., 
Xg"''^  =  x^^,  in  an  effort  to  reduce  the  computing  time  to  obtain  a  near-optimal  solution. 
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Clearly,  Step  3  of  Algorithm  1  is  not  implementable  as  X*  is  defined  in  terms  of  the  unknown 
f*  and  the  uncomputable  /(•).  We  discuss  this  further  in  Section  4.  Given  P  and  the  families 
{PatItvsin  and  {AAr(-)}7V6iN)  our  goal  is  to  determine  an  efficient  policy  for  selecting  the  number  of 
iterations  and  the  sample  size  in  Step  1  of  Algorithm  1.  Specifically,  we  would  like  to  find  a  policy 
that  approximately  minimizes  the  expected  computational  cost  required  in  Algorithm  1  to  reach  a 
near-optimal  solution.  We  refer  to  this  problem  as  the  sample-size  control  problem  and  formulate 
it  as  a  discrete-time  optimal-control  problem. 

2.2  Sample-Size  Control  Problem 

For  any  sample  of  size  G  IN  and  number  of  iterations  n,  let  A^{x)  denote  the  iterate  after  n 
iterations  of  the  algorithm  map  initialized  by  x.  That  is,  A^{x)  is  given  by  the  recursion 

A^(x)  =  X  and,  for  any  i  =  0, 1,  2, ...,  n  —  1, 

A^^\x)=Am{A^^{x)).  (6) 

We  consider  the  evolution  of  Algorithm  1  to  be  a  stationary  discrete-time  dynamic  system 
governed  by  the  dynamic  equation 

Xnk=  =  1,2,3,...,  (7) 

where  G  A  is  the  state  at  the  beginning  of  the  k-th  stage,  Uk  =  {N^.,  n^)  G  IN  x  (IN  U  {0})  is 

the  control  input  for  the  /c-th  stage,  and  x^^  =  Xq  is  the  initial  condition.  We  denote  the  random 
sample  of  stage  /c  by  4)^  =  {uji,u)2,  ...,u!^^).  Clearly,  for  any  k  G  IN,  x^^  is  unknown  prior  to  the 
realization  of  the  samples  u)^,  dP' ,  ...,  dP .  Hence,  dP  is  the  disturbance  induced  at  the  k-th  stage. 

We  define  the  feasible  set  of  controls  U{x)  as  follows:  If  x  G  X*,  then  U{x)  =  {(1,0)}. 
Otherwise,  U{x)  =  IN  x  IN.  Let  c  :  IN  x  (IN  U  {0})  ^  [0,oo)  be  the  computational  cost  of 
carrying  out  one  stage.  Specifically,  c{N,  n)  is  the  computational  cost  of  carrying  out  n  iterations 
of  algorithm  map  A]\f{-).  Also,  we  set  c(l,0)  =  0. 

Given  an  initial  solution  Xq  G  A,  we  seek  an  admissible  stationary  policy  ^  :  A  ^  IN  x  (INUjO}) 
with  /u(x^“^J  G  N(aJnfcl*^i)  £  A,  A:  G  IN,  that  minimizes  the  total  cost  function 

~  S 

J^(xj)  :=  limsupFl  (8) 

U=i 

subject  to  the  constraints  (7).  (In  (8)  we  slightly  abuse  notation  by  allowing  c(-,-)  to  take  a 
two-dimensional  vector  as  input  instead  of  two  scalar  values.)  Here,  E  denotes  expectation  with 
respect  to  the  disturbances  due  to  the  samples  d)^,  4)^,  ...,  4;^.  We  assume  that  the  cost  function 
c(-,-),  policy  ^(•),  and  algorithm  map  Ajv(-)  satisfy  sufficient  measurability  assumptions  so  that 
this  expectation  is  defined. 
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For  a  given  initial  solution  Xq  G  X,  we  define  the  sample-size  eontrol  problem 

SSCP:  infj^(xj),  (9) 

where  the  infimum  is  over  all  admissible  policies.  Conceptually,  the  solution  of  SSCP  provides  an 
optimal  policy  that  can  be  used  in  Step  1  of  Algorithm  1  to  determine  the  next  sample  size  and 
number  of  iterations. 

Clearly,  there  are  four  major  difficulties  with  solving  SSCP:  (i)  the  set  of  e-optimal  solu¬ 
tions  A*  is  typically  unknown,  (ii)  the  state  space  X  C  IR'^  is  continuous  and  potentially  large¬ 
dimensional,  (hi)  the  dynamic  equation  (7)  can  only  be  evaluated  by  computational  costly  calcu¬ 
lations,  and  (iv)  the  expectation  in  (8)  cannot  generally  be  evaluated  exactly.  In  the  next  section, 
we  present  a  control  scheme  based  on  a  surrogate  dynamic  model,  receding-horizon  optimization, 
and  parameter  estimation  that,  at  least  in  parts,  overcome  these  difficulties. 

3  Surrogate  Sample-Size  Control  Problem 

Instead  of  attempting  to  solve  SSCP,  we  construct  and  approximately  solve  a  surrogate  sample-size 
control  problem.  We  base  the  surrogate  problem  on  the  asymptotic  distributions  of  the  progress 
made  by  the  algorithm  map  given  a  particular  control,  which  we  derive  next. 

3.1  Asymptotic  Distributions  of  Progress  by  Algorithm  Map 

We  assume  that  the  algorithm  map  used  in  Algorithm  1  is  uniformly  linearly  convergent  as 

made  precise  in  the  following  assumption. 

Assumption  1  There  exists  a  9  £  {0, 1)  sueh  that 

fN{AN{x))  -  Jn  <  ^{fN{x)  -  In)  a.s.  (10) 


for  all  X  £  X  and  A  G  IN. 

When  applied  to  Pat,  with  F{-,uj)  being  continuously  differentiable  for  P-almost  every  a;  G  P, 
gradient  methods  based  on  feasible  directions  typically  satisfy  linear  rate  of  convergence  under 
standard  assumptions,  see,  e.g..  Theorem  1.3.18  in  [28].  Assumption  1  requires  that  there  exists  a 
uniform  rate  of  convergence  coefficient  that  is  valid  almost  surely.  This  holds,  for  instance,  when 
the  eigenvalues  of  the  Hessian  of  fN{x),  x  G  A,  A  G  IN,  are  positive  and  are  bounded  from  above 
and  away  from  zero  almost  surely.  We  find  a  similar  linear  rate  of  convergence  assumption  in  [26] . 

Given  a  sample  of  size  A,  we  consider  the  progress  towards  after  n  iterations  of  the  algorithm 

map.  It  follows  trivially  from  Assumption  1  and  optimality  of  ffj  that  for  any  x  G  A, 


/iv  <  fN{A'ff{x))  <  ffi}{x)  fff  0"(/Ar(x)  —  /i(r) 


a.s. 


(11) 


We  are  unable  to  derive  the  distribution  of  fN{A^{x)),  but  will  focus  on  its  asymptotic  distributions 
as  well  as  those  of  its  upper  and  lower  bounds  in  (11).  The  derivations  rely  on  the  following 
assumptions. 

Assumption  2  We  assume  that  lE[F(x,u;)^]  <  oo  for  all  x  ^  X. 

Assumption  3  There  exists  a  measurable  funetion  C  :  Q  ^  [0,oo)  sueh  that  lE[C'(a;)^]  <  oo  and 

\F{x,u!)  —  F{x,(jj)\  <  C{(jj)\\x  —  x'W  (12) 

for  all  x,x'  G  X  and  W-almost  every  uj  G  id. 

Below  we  need  the  following  notation.  Let  Y(x),x  G  X,  denote  normal  random  variables  with 
mean  zero,  variance  iT^(x)  :=  Var[F{x,ijj)],  and  covariance  C'ou[y(x),  y(x')]  :=  Cov[F{x,ijj),F{x' ,uj)] 
for  any  x,x'  G  X.  We  also  let  ^  denote  convergence  in  distribution. 

It  is  well-known  that  the  lower  bound  in  (11)  is  typically  “near”  f*  for  large  N  as  stated  next. 

Proposition  1  [fO]  Suppose  that  Assumptions  2  and  3  hold.  Then, 

N^^HfN-n^  inf  n^),  (13) 

x£X* 

as  N  ^  oo. 

Consequently,  if  there  is  a  unique  optimal  solution  x*  of  P,  i.e.,  X*  =  {x*},  then  the  lower  bound 
fff  on  /Ar(A)(f(x))  (see  (11))  is  approximately  normal  with  mean  f*  and  variance  a‘^{x*)/N  for 
large  N. 

We  now  turn  our  attention  to  the  upper  bound  on  fj^{Af^{x)).  We  present  two  results.  The 

first  one  is  an  asymptotic  result  as  N  ^  oo  for  a  given  n.  The  second  one  considers  the  situation 

when  both  N  and  n  increase  to  infinity.  Below  we  denote  a  normal  random  variable  with  mean  m 
and  variance  v  by  N{m,  v). 

Theorem  1  Suppose  that  Assumptions  1,  2,  and  3  hold  and  that  there  is  a  unique  optimal  solution 
X*  o/P,  i.e.,  X*  =  {x*}.  Then,  for  any  x  G  X  and  n  G  IN 

^N{t),Vn{x)),  (14) 

as  N  ^  oo,  where 

V^{x)  =  (1  -  r)V2(x*)  +  02^cj2(x)  +  2Cov{F{x*,uj),F{x,  a;))(l  -  0^)9^.  (15) 

Proof:  By  definition 

N^^VN{x)-r -0^{f{x)-n]  =  (1  -  r)iv'/2(/^  - /*)  (16) 

+  e^N^/^{Mx)-f{x)). 
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Since  P  has  a  unique  optimal  solution,  it  follows  from  Theorem  5.7  in  [41]  that 


jStI/2  (  fN{x)  -  f{x) 

\  In -r 


Y{x) 

Y{x*[ 


(17) 


as  N  ^  oo.  Then,  the  result  follows  after  application  of  the  continuous  mapping  theorem,  see,  e.g.. 
Theorem  29.2  in  [5].  □ 

In  view  of  Theorem  1,  we  see  that  the  upper  bound  on  fj^{A^{x))  is  approximately  normal 
with  mean  f*  +  9’^{f{x)  —  f*)  and  variance  Vn{x)/N  for  large  N .  If  we  relax  the  assumption  of  a 
unique  optimal  solution  of  P,  we  obtain  the  following  asymptotic  results  as  n,N  ^  oo. 

Theorem  2  Suppose  that  Assumptions  1,  2,  and  3  hold  and  that  ^  a  G  [0,oo],  as  n,N  ^ 

oo.  Then,  for  any  x  ^  X , 


o-VN{x)-r] 

N^^VN(x)-r] 


fix)  -  f*,  if  a  =  oo; 
inf  y(x')  +  a{f{x)  -  f*),  if  a  G  [0,oo); 


(18) 

(19) 


oo. 


as  N,  n 

Proof:  We  only  consider  (19)  as  the  other  case  follows  by  similar  arguments.  By  definition, 

N^^VNix)  -  n  (20) 

=  -  n  +  e^N^/\fNix)  -  fix))  +  e^N^/\fix)  -  n  -  e^N^/\ff,  -  r ). 

The  result  now  follows  from  Proposition  1,  the  central  limit  theorem,  and  Slutsky’s  theorem  (see, 

e.g..  Exercise  25.7  of  [5]).  □ 

Corollary  1  Suppose  that  Assumptions  1,  2,  and  3  hold  and  that 
Then,  for  any  x  G  X, 


0,  as  n,  N  oo. 


n^/^[maux))  -  r 


inf  Y (x) 
x'oX* 


(21) 


as  N,n  ^  oo. 


Proof:  The  result  follows  directly  from  (11),  Proposition  1,  and  Theorem  2.  □ 

In  view  of  Theorem  2,  we  observe  that  the  upper  bound  on  fj^{A^{x))  is  approximately 
normally  distributed  with  mean  f*  +0”(/(x)  —  /*)  and  variance  a‘^ix*)/N  for  large  n  and  N  when 
X*  =  {x*}.  Since  Vnix)  (T^(x*),  as  n  ^  oo,  we  find  that  the  last  observations  is  approximately 
equivalent  to  the  one  after  Theorem  1  when  n  is  large.  Moreover,  Corollary  1  shows  that  the 
lower  and  upper  bounds  on  fj\f{A^{x)),  and  hence  also  f]\fiA'^ix)),  have  approximately  the  same 
distribution  for  large  n  and  N  when  n  is  sufficiently  large  relative  to  N.  In  the  next  subsection,  we 
adopt  a  conservative  approach  and  use  the  upper  bounds  from  Theorems  1  and  2  to  estimate  the 
progress  of  the  algorithm  map  for  different  controls. 
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3.2  Development  of  Surrogate  Sample-Size  Control  Problem 

In  this  subsection,  we  model  the  evolution  of  the  state  using  a  surrogate  dynamic  equation 

based  on  the  previous  subsection  and  a  surrogate  state  obtained  by  aggregation.  We  note  that  be¬ 
havioral  models  of  algorithmic  progress  exist  for  local  search  algorithms  [25]  and  genetic  algorithms 
[38].  However,  these  models  appear  not  directly  applicable  here. 

Suppose  that  Algorithm  1  has  carried  out  k  —  1  stages  and  has  reached  Step  1  of  the  /c-th 
stage.  At  this  point,  we  consider  the  current  and  future  stages  I  =  k,k  +  \,k  +  2^ ...,  in  an  attempt 
to  determine  the  control  {Nk,nk)  for  the  current  stage.  We  start  by  considering  function  values 
instead  of  iterates,  which  aggregates  the  state  space  from  d  dimensions  to  one  dimension.  Theorems 
1  and  2  indicate  possible  models  for  the  evolution  of  function  values  in  Algorithm  1.  If  and  Ai^ 
are  large.  Theorem  2  states  that  conditional  on  and  given  a  unique  optimal  solution  of  P,  an 

upper  bound  on  fNf,{Xn^)  is  approximately  distributed  as 

-  n,<xHx*)/Nk).  (22) 

Moreover,  if  only  is  large.  Theorem  1  states  that  conditional  on  an  upper  bound  on 

fNk  {Xnf. )  is  approximately  distributed  as 

M{r  +  -  n.vnjNk).  (23) 

We  note,  however,  that  if  a{x*)  k,  and  Cov{F{x* ,uj),F{x^^^,uj))  k,  a{x*)(j{x^^^),  i.e., 

F{x* ,uj)  and  F{x^~^^ ,  co)  are  highly  correlated,  then  a‘^{x*)  ^  Vn^.-  Hence,  (22)  and  (23)  are  approx¬ 
imately  equal  in  distribution  when  x^^^  is  close  to  x* .  The  paragraph  after  Corollary  1  indicates 
that  (22)  and  (23)  are  also  approximately  equal  in  distribution  when  is  large.  Consequently, 
we  adopt  the  simpler  expression  (22)  as  we  conjecture  that  for  small  k,  x^~^^  is  far  from  x*  but 
an  efficient  policy  typically  involves  a  large  Uk-  (This  conjecture  is  supported  by  our  numerical 
experiments.)  On  the  other  hand,  when  k  is  large,  x^~^^  tends  to  be  close  to  x*.  Hence,  (22) 
appears  to  be  reasonably  accurate  in  the  present  context.  We  have  verified  empirically  that  (23)  is 
not  a  significantly  better  approximation  of  fN^.{Xn^)  than  (22). 

Clearly,  (22)  would  not  be  sufficient  for  estimating  /(x^^),  ®tc.,  as  knowledge  of 

/(x(j“^i)  does  not  specify  /(a;(^J,  only  fNiixi^)-  We  are  unable  to  derive  the  distribution  of  /(x^^) 
conditional  on  x^^^  and  heuristically  approximate  that  distribution  by  (22)  with  truncation  at  f* 
to  account  for  the  fundamental  relation  /(x)  >  f*  for  all  x  £  X.  Hence,  we  let 

-  n,^\^*)/Nk,n  (24) 

be  our  approximation  of  the  distribution  of  /(x^^)  conditional  on  where  M{m,v,t)  denotes 

a  truncated  normally  distributed  random  variable  with  an  underlying  normal  distribution  v) 

and  lower  truncation  thresholds  t,  i.e.,  the  cumulative  distribution  function  l>(^)  of  JV{m,v,t)  is 


11 


I>(^)  =  (^*((^  —  m)/ y/v)  —  ^{{t  —  m)/^/v))/{l  —  —  m)/ y/v)),  i>t,  where  <!>(•)  is  the  standard 

normal  cumulative  distribution  function. 

If  /*)  cr{x*)  had  been  known  at  the  beginning  of  the  fc-th  stage,  we  could  use 

(24)  to  estimate  Moreover,  we  could  use  (24)  recursively  and  estimate  /(x(jJ,  I  =  k  +  l,k  + 

2, ....  In  Section  4,  we  construct  estimation  schemes  for  /*,  6,  and  a{x*).  Since  is  known  at  the 
beginning  of  the  k-th  stage,  we  can  also  estimate  f{Xn~^-^)  by  a  sample  average  at  that  time.  Hence, 
we  proceed  with  (24)  as  the  basis  for  our  model  of  the  evolution  of  /(x(jJ,  I  =  k,k  +  l,k  +  2, ..., 
in  Algorithm  1.  Specifically,  we  define  fi,l  =  k,  k  +  1,  k  +  2, to  be  the  surrogate  state  at  the 
beginning  of  the  ^-th  stage,  which  represents  our  estimate  of  /(x(^“j^j).  We  let  Pf,p*,pg,  and  Pa-  be 
the  estimates  of  and  cr{x*),  respectively. 

Given  the  estimates  Pf,p*,pe,Pa  and  the  controls  {Nk,nk),  {Nk+i,nk+i),  {Nk+2,nk+2),  ■■■,  we 
define  the  surrogate  dynamie  equation  for  the  surrogate  state  by 

fi+i=J^{p*+PQ'{fi-p*),pl/Ni,p*),  l  =  k,k  +  l,k  +  2,...,  (25) 

with  initial  condition  fk  =  Pf.  We  note  that  the  equality  in  (25)  indicates  equality  in  distribution. 
The  terminal  states  of  SSCP  are  defined  by  X*,  which  translates  to  the  following  surrogate  terminal 
states 

T:={ie^]R\^-p*  <e].  (26) 

We  define  the  feasible  set  of  controls  R{C}  for  the  surrogate  problem  as  follows:  If  G  T,  then 
R{^)  :=  {(1,0)}.  Otherwise,  R{C)  :=  IN  x  IN. 

We  now  define  the  surrogate  sample-size  control  problem.  Given  a  stopping  tolerance  e  >  0 
and  the  estimates  Pf,p*,pe,  and  p^  at  the  beginning  of  stage  k,  we  seek  an  admissible  stationary 
policy  ^  :  IR  ^  IN  X  (IN  U  {0})  with  G  R{fi)  for  all  G  IR,  I  =  k,k  +  l,k  +  2,...,  that 

minimizes  the  total  surrogate  east  funetion 

(27) 

subject  to  the  constraints  (25).  Here,  E  denotes  expectation  with  respect  to  the  disturbances  on 
stages  k,  k  +  1,  ...,  k  +  s  given  by  the  truncated  normal  distribution  in  (25).  Then,  we  define  the 
surrogate  sample-size  eontrol  problem 

S-SSCPfc(p/,p*,p0,Po-,e)  :  ml  Jk,ki{Pf  ,P* -.Pe^PaX),  (28) 

where  the  infimum  is  over  all  admissible  policies. 

We  could  easily  restrict  R{C)  in  S-SSCP^.  For  example,  one  could  impose  minimum  values  on 
the  sample  size  and  the  number  of  iterations  at  various  stages  to  ensure  that  the  surrogate  dynamic 
equation  (25)  represents  the  algorithmic  progress  reasonably  accurately.  In  numerical  examples, 
we  effectively  impose  such  restrictions  as  part  of  the  solution  process  as  described  next. 


JkAPf,P*,Pe,PaX)  :=limsupF; 
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We  solve  S-SSCP^  approximately  by  discretizing  the  surrogate  state  space  and  the  truncated 
normal  distribution,  and  then  apply  the  backward  recursion  algorithm  to  the  resulting  dynamic 
program.  In  the  numerical  examples  below,  we  find  it  sufficient  to  consider  10  stages  into  the 
future  and  to  discretize  the  interval  \p*  +  e,pf  +  1.96po-/.A^fc-i)  using  14  points.  We  only  consider  10 
possible  values  of  ni  in  the  range  zero  to  max{10,  |'log(0.1e/(pj  —  p*))/ logpe] },  where  [a]  denotes 
the  smallest  integer  no  smaller  than  a.  We  observe  that  the  upper  end  of  the  range  of  ni  is  simply 
the  larger  of  10  and  the  number  of  iterations  required  to  reach  within  O.le  of  the  optimal  value  in 
presence  of  no  uncertainty.  We  consider  18  possible  values  of  Ni  mostly  in  the  range  [l.lA^/s-i] 
and  10A^fc_i,  but  also  with  two  larger  values  dynamically  selected. 

The  optimal  policy  found  from  solving  the  discretized  S-SSCPfc  provides  controls  {Nk,nk), 
{Nk+i,nk+i),  {Nk+2,nk+2),  etc.  However,  we  utilize  only  {N^,  Uk)  for  the  fc-th  stage  as  our  approach 
is  implemented  within  a  receding-horizon  framework  with  parameter  estimation  after  each  stage. 
We  refer  to  the  resulting  policy  as  the  S-SSCP  policy.  We  discuss  the  parameter  estimation  and 
the  full  algorithm  next. 

4  Parameter  Estimation  and  Full  Algorithm 

4.1  Parameter  Estimation 

After  completing  iterations  with  sample  size  Nk  in  stage  k  of  Algorithm  1,  the  iterates 

and  function  values  {/iVi,(a;f)}7=o  known.  We  use  these  quantities  to  update  the  parameters 

Pf,P*,Pe,  and  p^. 

We  estimate  for  stage  k  +  1  hy  simply  setting  it  equal  to  the  sample  variance  at  the 

last  iterate,  i.e., 

1  W 

pI  =  ffi+i  :=  _  -  fNkiXnu)?^  (29) 

fc  j=i 

where  uji,uj2-,  is  the  sample  used  in  stage  k. 

We  adopt  the  approach  in  [9]  to  estimate  the  rate  of  convergence  coefficient  9,  but  add  an 
exponential  smoothing  step  to  avoid  large  changes  in  the  estimate.  A  (biased)  estimate  of  f*  is 
computed  by  a  weighted  average  of  estimates  of  .  We  let  and  9k  denote  the  estimates  of  f* 
and  9  at  the  beginning  of  the  k-th.  stage,  respectively. 

Subroutine  A  (Estimates  9  and  f*  at  the  end  of  stage  k). 

Parameters.  Exponential  smoothing  parameter  cj)  G  (0, 1]  and  tolerance  >  0. 

Data.  Previous  estimates  9k  and  /^;  and  function  values  )}”=o  stage  k. 

Step  0.  Set  9  =  9k- 
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Step  1.  Estimate  minimum  value  of  Patj,: 


'n  1  f  ^ 


r=o 


I  —  Qrik-i 


(30) 


Step  2.  Solve  the  least-square  problem 


rik 

{a*,b*)  =  argmin^(log(/Ar^(a:f)  -  4)  -  iloga  -  bf. 

Step  3.  If  |0  —  a*|  <  eg,  set  9  =  a*  and  go  to  Step  4.  Else,  set  9  =  a*  and  go  to  Step  1. 
Step  4.  Set  9k+i  =  (p9  +  {1  —  4>)9k- 

Step  5.  Compute  conservative  estimate  of  minimum  value  of  Ptvj,: 


(31) 


rrik  '•=  mm 

2=0,l,...,nfc  — 1 


Step  6.  Estimate  optimal  value  of  P: 


fNkjXnJ  -  (4+1 )""  "fNkiXj) 

1  -  (4+1)"''-* 

Nk  .  . 


fk+l  ■=  .r  + 


Er=iiV/  j:f=iNi 


fk 


(32) 


(33) 


and  Stop. 


It  follows  from  Assumption  1  that  >  {fN^ix^^)  —  0"'*“*/iVfe(^f))/(l  “  4fc“®)  almost  surely 
for  any  i  =  0, 1,  ...,nk  —  1.  Step  1  in  Subroutine  A  averages  these  lower  bounding  estimates  with 
the  current  estimate  of  9  and  uses  that  as  an  estimate  of  Given  the  estimate  of  Step 
2  computes  the  rate  of  convergence  coefficient  that  best  fit  {fNk{xi)}^=o  in  a  least-square  sense. 
We  observe  that  if  {fNk{xi)}^=o  is  a  linearly  progressing  sequence,  then  a  correct  9  is  found  by 
Steps  1-3  of  Subroutine  A.  Those  steps  were  originally  proposed  in  [9]  in  the  context  of  semi-infinite 
optimization. 

Step  4  applies  exponential  smoothing  to  the  estimates  of  the  rate  of  convergence  coefficient  to 
avoid  large  fluctuations.  Step  5  computes  a  conservative  estimate  of  /](r^  that  is  merged  with  the 
current  estimate  of  f*  using  weighted  averages  in  Step  6. 

At  the  end  of  the  A:-th  stage,  we  set  pg  =  9k+i,  i.e.,  the  estimate  of  the  rate  of  convergence 
coefficient  using  iterates  up  to  stage  k.  Typically,  we  set  p*  =  but  adjust  that  if  a  condition 
is  satisfied  as  described  in  the  next  subsection. 

For  all  /c  G  IN,  we  estimate  f{x^^)  at  the  end  of  the  fc-th  stage  by  computing  the  unbiased 
estimate  fN*{Xn,.)  using  an  independent  sample  of  size  N* .  We  typically  set  pf  =  fN*{x^^)  for  use 
in  S-SSCPfc+i.  However,  we  deviate  from  that  occasionally  as  described  below. 
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In  S-SSCPfc,  the  computational  cost  function  c(-,  •)  is  assumed  to  be  known.  In  reality,  that 
may  not  be  the  case  and  we  adopt  the  following  simple  model  of  computational  cost  for  one  stage 
given  Nk,  Uk,  and  N* .  We  set 

c{Nk,  Uk)  =  wiNkUk  +  W2N*  (34) 

where  wi,  W2  >0  are  parameters  updated  after  the  {k  —  l)-th  stage  by  setting  wi  =  ti/(A^fc_infc_i) 
and  W2  =  t2/N*,  with  ti  and  t2  being  the  computing  times  to  carry  out  the  Uk-i  iterations  and  to 
compute  the  estimate  during  stage  k  —  1,  respectively.  An  alternative  polynomial  model 

of  computational  cost  based  on  linear  regression  is  used  in  [9].  However,  we  find  (34)  reasonable  in 
the  present  situation  as  the  time  required  to  calculate  /Ar(x)  for  a  given  x  is  linear  in  N.  Hence, 
the  effort  required  to  apply  the  algorithm  map  once  tends  to  be  linear  in  N. 

4.2  Full  Algorithm 

We  now  present  the  extension  of  Algorithm  1  that  includes  details  about  policy  selection  and  pa¬ 
rameter  estimation. 

Algorithm  2  (Implementable  Algorithm  for  P). 

Data.  Optimality  tolerance  e  >  0;  initial  sample  size  Nq  G  IN;  validation  sample  size  N*;  default 
sample  size  factor  yjv  >  0;  default  iteration  number  7^  G  IN;  initial  estimate  of  rate  of 
convergence  coefficient  0i;  initial  solution  Xq  G  A. 

Step  0.  Generate  an  independent  sample  of  size  Nq,  compute  /Noi^o)  and  di,  and  set  /i  =  /i  = 
fNoixo),  Pf  =  fi+  diZ/ZVo,  p*  =  ff  -  max{l,  /i},  pe  =  §1,  p^  =  ai,  k  =  1,  and  xj  =  xg. 

Step  1.  Determine  Uk  and  Nk  by  solving  S-SSCPk{Pf,P*,Pe,PcTN)- 

Step  2.  Generate  an  independent  sample  of  size  Nk- 

Step  3.  For  i  =  0  to  Ufc  —  1:  Gompute  xf_^^  =  An^,{x^)  using  the  sample  from  Step  2. 

Step  4.  Gompute  dfc+i,  Ok+i,  and  by  (29)  and  Subroutine  A. 

Step  5.  Generate  an  independent  sample  of  size  N*  and  compute  /tv*  {x^^  ) . 

Step  6.  If  -h  e  <  fN*{XnJ,  set  pf  =  /Ar*(x^J,  p*  =  pg  =  4+i,  and  p^  =  dfc+i.  Replace 
k  hy  k  +  1  and  go  to  Step  1. 

Elseif  -  a-k+i/\jT,i=i^i  +  e  <  fN*{XnJ  +  ^k+i/V^,  then  set  pf  =  fN*{XnJ  + 
(Tfc+i/\/iV*,  p*  =  /Z+i  -  ^k+i/\jT,i=i  Ni,  pg  =  4+1,  and  p^  =  dfc+i.  Replace  k  by  k  +  1  and 

go  to  Step  1. 

Else  set  Nk+i  =  [yTvAfc]  and  Uk+i  =  7n-  Replace  k  by  k  +  1  and  go  to  Step  2. 
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Step  0  of  Algorithm  2  computes  a  rudimentary  estimate  of  f*.  If  problem  specific  information  is 
available,  the  initial  estimate  of  f*  may  be  improved.  We  note  that  if  +  e  >  fN*{x^^),  then 
S-SSCPfc_|_i(/Ar*(x^^), e)  returns  the  policy  A^fc+i  =  1  and  rik+i  =  0  since  a  surrogate 
terminal  state  is  already  reached.  This  may  occur  if  and/or  /tv*  )  are  inaccurately  estimates 
of  f*  and  /(x^^),  respectively.  In  that  case,  Step  6  of  Algorithm  2  attempts  to  use  conservative 
estimates  of  f*  and  /(x^^).  If  the  conservative  estimates  do  not  immediately  lead  to  a  surrogate 
terminal  state,  they  are  adopted  in  S-SSCPk+i{Pf,P*,Pe,Pa,€)-  If  the  conservative  estimates  still 
immediately  lead  to  a  surrogate  terminal  state,  is  probably  (close  to)  a  near-optimal  solution 
and  Algorithm  2  resorts  to  a  default  policy. 

Even  though  S-SSCPfc  aims  to  obtain  a  near-optimal  solution,  such  a  solution  is  not  guar¬ 
anteed  due  to  inaccurate  parameter  estimates  and  other  approximations.  However,  Algorithm  2 
is  essentially  identical  to  Algorithm  1  except  it  includes  a  specific  policy  for  selecting  and  n^. 
Hence,  under  Assumption  1  and  the  adaption  of  a  uniform  law  of  large  numbers  (Theorem  7.48  in 
[41]),  it  follows  that  given  a  particular  stage  k,  there  exists  an  N'  and  n!  such  that  if  Nk  >  N'  and 
Hk  >  n',  then  x^^  generated  by  Algorithm  2  is  contained  in  X*.  This  fact  gives  assurance  that  a 
near-optimal  solution  can  be  obtained  for  large  rik  and  N^. 

In  practice,  all  calculations  must  be  terminated  after  a  finite  time  period,  which  raises  the 
question  of  how  to  estimate  the  proximity  to  optimality  for  the  obtained  solutions.  We  discuss  that 
topic  next. 


4.3  Validation  Analysis 

The  proximity  to  optimality  of  a  solution  x^^  obtained  by  Algorithm  2  could  be  estimated  using  an 
optimality  function  [32]  or  a  hypothesis  test  of  Karush-Kuhn- Tucker  conditions  [42] .  We  develop 
a  third  approach  motivated  by  the  optimality  gap  estimates  of  [24,  20].  As  we  see  below,  this 
approach  utilizes  quantities  already  computed  in  Algorithm  2. 

Step  5  of  Algorithm  2  computes  in  the  k-th  stage  fN*{x^^).  Under  Assumption  2,  the  central 
limit  theorem  gives  that  fN*{Xn^)  is  approximately  normally  distributed  with  mean  /(x^^)  and 
variance  for  large  N*.  Hence,  fN*{Xn^)  is  a  probabilistic  upper  bound  on  f*.  This 

upper  bound  is  identical  to  those  used  in  [24,  20]. 

We  depart  from  [24,  20]  when  constructing  a  probabilistic  lower  bound  on  f*.  If  Assumption 
1  holds  and  Subroutine  A  has  constructed  such  that  1  >  9i  >  0  for  all 

I  =  2,3, k  +  1,  then  mi  <  almost  surely  for  all  I  =  1, 2, 3, ...,  A:.  Consequently,  it  follows  from 
the  definition  of  that 


Jk+1 


k 

^  Nirhi  < 


1=1 


2^j=i  1=1 


a.s. 


(35) 


Since  E[//f]  <  f*  for  all  A  G  IN,  see,  e.g.,  [20],  it  follows  that  E[/^_i_^]  <  f*.  Consequently, 
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is  a  lower  bound  on  /*,  on  average.  In  the  case  of  a  nonconvex  problem,  Assumption  1  typically 
only  holds  on  subsets  of  X  near  locally  optimal  solutions  with  replaced  by  the  locally  optimal 
values.  Hence,  in  practice  may  only  be  a  probabilistic  lower  bound  on  a  locally  optimal  value 
in  the  case  of  a  nonconvex  problem. 

We  now  consider  the  asymptotic  distribution  of  the  lower  bound  on  /^. 

Theorem  3  Suppose  that  Assumptions  1,  2,  and  3  hold  and  that  ^0,  as  n,N  oo. 

Then,  for  any  x  G  X, 


as  N,n  ^  oo. 


^1/2  (  fN{A%{x))  -e^fNjx) 

\  1  -  6»” 


inf  Y(x') 
x'ex* 


(36) 


Proof:  Since  is  suboptimal,  it  follows  that 

fN{AUx))-0^fN{x) 


fN> 


1  - 


Hence, 


(37) 


> 

> 


-  n 

N^/\fN{A%{x))  -  n  -  e^N^/HfN{x)  -  r 

i-e^ 

-  D  -  rivV2(/^(^)  _ 


l-gn 


a.s. 


(38) 


The  last  expression  tends  to  m.ixi^x*Y{x')  as  n,N  oo  with  ^  0  by  Proposition  1, 

the  central  limit  theorem,  and  Slutsky’s  theorem  (see,  e.g..  Exercise  25.7  of  [5]).  This  result  and 
Proposition  1  prove  the  theorem.  □ 

When  the  sequence  {/Arj,(a:f)}”=o  linearly  progressing  with  rate  of  progression  dk+i,  then  the 
minimization  in  (32)  can  be  ignored.  In  this  case,  we  see  from  Theorem  3  that  rhk  is  approximately 
normally  distributed  with  mean  f*  and  variance  a‘^{x*)/Ni~  when  Ni~  is  large,  is  large  relatively 
to  Nk,  and  X*  =  {x*}.  In  principle,  a  central  limit  theorem  for  triangular  arrays  could  provide 
the  asymptotic  distribution  of  as  k  ^  oo.  However,  we  find  such  a  result  less  useful  as  the 

number  of  stages  before  Algorithm  2  finds  a  near-optimal  solution  is  typically  rather  small  (e.g., 
5-15).  Hence,  we  proceed  with  a  finite  k  and  simply  observe  that  if  rhi  is  approximately  normal  for 
all  stages  ^  =  1,  2, ...,  k,  then  is  approximately  normally  distributed  with  mean  f*  and  variance 
a‘^{x*)/  J2i=i  ^i-  We  note  that  since  the  asymptotic  distribution  in  (36)  is  independent  of  x,  the 
dependence  between  the  stages  induced  by  the  warm  starting  Xg  =  is  insignificant  for  large 

number  of  iterations. 

Combining  the  upper  and  lower  bounds  on  f*  we  obtain  that  P’[/(a:^^)  —  /*  <  e]  is  approxi¬ 
mately  bounded  below  by 


-  fN*{xi^) 

+  ^k+l 


/N 


(39) 
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where  a'^{x*)  is  estimated  by  Hence,  if  (39)  is  large,  then  it  is  likely  that  G  X*.  The  last 
solution  of  each  stage  in  Algorithm  2  can  be  validated  in  this  manner. 

5  Computational  Studies 

There  are  obviously  many  alternatives  to  the  sample-size  policy  induced  by  S-SSCP^  in  Step  1 
of  Algorithm  2.  In  this  section  we  examine  some  that  we  believe  are  practical  alternatives  to  the 
S-SSCP  policy  including  the  asymptotically  optimal  policy  of  the  recent  paper  [26].  Specifically, 
we  compare  the  computing  time  required  to  obtain  a  near-optimal  solution  by  Algorithm  2  using 
different  sample-size  selection  policies  in  Step  1.  As  mentioned  in  Section  1,  stochastic  programs 
may  also  be  solved  by  algorithms  not  based  on  SAA.  However,  in  this  paper  we  do  not  compare 
across  algorithmic  frameworks  and  focus  on  efficient  sample-size  selection  within  SAA  when  applied 
to  smooth  stochastic  programs. 

We  implement  Algorithm  2  with  Subroutine  A  in  Matlab  Version  7.4  on  a  laptop  computer 
with  2.16  GHz  processor,  2  GB  RAM,  and  Windows  XP  operating  system.  We  use  one  iteration  of 
the  projected  gradient  method  with  Armijo  step  size  rule  (see,  e.g.,  p.  67  of  [28])  as  the  algorithm 
map  A]s[{-).  The  quadratic  direction  finding  problem  in  the  projected  gradient  method  is  solved 
using  LSSOL  [8]  as  implemented  in  TOMLAB  7.0  [11]. 

In  all  computational  tests,  we  use  parameters  a  =  0.5  and  (3  =  0.8  in  Armijo  step  size  rule 
(see  p.  67  of  [28])  as  well  as  exponential  smoothing  parameter  cf)  =  l/2>  and  tolerance  eg  =  0.0001 
in  Subroutine  A.  We  use  initial  sample  size  Nq  =  1000,  default  sample  size  factor  yjv  =  1-1,  default 
iteration  number  7„  =  3,  and  initial  estimate  of  rate  of  convergence  coefficient  9i  =  0.9.  Our  initial 
estimates  of  wi  and  W2,  see  (34),  are  3  and  1,  respectively. 

5.1  Numerical  Examples 

We  consider  the  following  five  of  problem  instances  for  our  numerical  tests.  The  first  three  in¬ 
stances  are  constructed  examples  of  P  with  known  optimal  solutions  that  allow  us  to  examine 
different  degree  of  ill-conditioning.  The  fourth  problem  instance  arises  in  military  and  civilian 
search  and  rescue  operations.  The  fifth  instance  arises  in  engineering  design  with  multiple  per¬ 
formance  functions.  The  last  problem  instance  illustrates  that  Algorithm  2  may  be  used  even  if 
F(-,uj)  is  nonsmooth. 

5.1.1  Problem  Instances  QUADl,  QUAD2,  and  QUADS 

Problem  instances  QUADl,  QUAD2,  and  QUAD3  are  defined  in  terms  of 

20 

F{x,  w)  =  ^  ai{xi  -  hitOif  (40) 

i=\ 
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with  =  21  —  i,  i  =  1,2,. ..,20,  and  oo  =  (wi,  0^2,  •••,  <^20)^  being  a  vector  of  20  independent 
and  [0,  l]-uniformly  distributed  random  variables.  We  refer  to  the  problem  instance  with  a,  =  i, 
i  =  1,2,...,  20,  by  QUADl,  the  problem  instance  with  a*  =  1  +  199(i  —  1)/19,  i  =  1,2,...,  20, 
by  QUAD2,  and  a*  =  1  +  1999(i  —  1)/19,  i  =  1,2,. ..,20,  by  QUADS.  All  these  instances  are 
unconstrained  and  we  set  X  equal  to  a  sufficiently  large  convex  compact  subset  of  1R^°  that  includes 
all  relevant  solutions.  Obviously,  QUADl,  QUAD2,  and  QUADS  are  strictly  convex  with  identical 
unique  global  minimizer  x*  =  {xl, ...,  a;2o)^  where  x*  =  bij2.  The  optimal  value  is  aibfj  12.  We 

note  that  QUADl,  QUAD2,  and  QUADS  are  increasingly  ill-conditioned.  Even  though  solvable 
without  SAA,  we  use  these  simple  problem  instances  to  illustrate  our  approach.  For  QUADl, 
QUAD2,  and  QUADS  we  set  Xq  =  0  G  1R^°  and  use  relative  optimality  tolerance  0.001,  i.e., 
e  =  O.OOlp*  in  Algorithm  2. 

5.1.2  Problem  Instance  SEARCH 

The  next  problem  instance  generalizes  a  classical  problem  arising  in  search  and  detection  applica¬ 
tions.  Consider  an  area  of  interest  divided  into  d  cells.  A  stationary  target  is  located  in  one  of  the 
cells.  A  priori  information  gives  that  the  probability  that  the  target  is  in  cell  i  is  pi,  i  =  1,2, 
with  Pi  =  1.  The  goal  is  to  optimally  allocate  one  time  unit  of  search  effort  such  that  the  prob¬ 
ability  of  not  detecting  the  target  is  minimized  (see,  e.g.,  p.  5-1  in  [46]).  We  generalize  this  problem 
and  consider  a  random  search  effectiveness  in  cell  i  per  time  unit  and  minimize  the  expected  proba¬ 
bility  of  not  detecting  the  target.  We  let  x  =  {xi,X2, ...,  G  IR'^,  with  Xi  representing  the  number 
of  time  units  allocated  to  cell  i,  and  let  uj  =  (wi,  0^2,  •••,  <^d)^  with  tOi,  i  =  l,2,...,d,  being  inde¬ 
pendent  lognormally  distributed  random  variables  (with  parameters  =  lOOui  and  A*  =  0,  where 
Ui  G  (0, 1)  are  given  data  generated  by  independent  sampling  from  a  uniform  distribution)  repre¬ 
senting  the  random  search  effectiveness  in  cell  i.  Then,  the  expected  probability  of  not  detecting 
the  target  is  f{x)  =  lE[F(x,ti;)],  where 

d 

F{x,uj)  =  Y,Pie~"'"'-  (41) 

i=l 

The  decision  variables  are  constrained  by  =  1  Xi  >  0,  i  =  l,2,...,d.  We  consider 

d  =  100  cells.  This  problem  instance,  referred  to  as  SEARCH,  is  convex.  We  observe  that  the 
expectation  in  the  objective  function  can  be  computed  by  (numerically)  solving  d  one-dimensional 
integrals.  However,  our  goal  is  to  illustrate  Algorithm  2,  which  is  based  on  SAA,  so  we  do  not 
pursue  that  avenue.  For  this  problem  instance,  we  use  Xg  =  (1/100, ...,  1/100)'  G  1R^'^°  and  use 
relative  optimality  tolerance  0.001,  i.e.,  e  =  O.OOlp*  in  Algorithm  2. 

5.1.3  Problem  Instance  TRUSS 

The  last  problem  instance  deals  with  the  design  of  a  truss  structure  with  topology  given  in  Figure 
1.  The  truss  is  subject  to  a  random  load  L  in  its  mid-span.  L  is  lognormally  distributed  with 
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Figure  1:  Design  of  Truss 

mean  100  kN  and  standard  deviation  10  kN.  Let  Si  be  the  yield  stress  of  member  i.  Members 
1-7  have  lognormally  distributed  yield  stresses  with  means  100,  120,  180,  190,  200,  210,  and  220 
N/mm^,  respectively.  Members  1  and  2  have  standard  deviation  5  N/mm^  and  members  3-7 
have  standard  deviations  10  N/mm^.  The  yield  stresses  of  members  1  and  2  are  correlated  with 
correlation  coefficients  0.8.  However,  their  correlation  coefficients  with  the  other  yield  stresses  are 
0.5.  Similarly,  the  yield  stresses  of  members  3-7  are  correlated  with  correlation  coefficients  0.8, 
but  their  correlation  coefficients  with  the  yield  stresses  of  members  1  and  2  are  0.5.  The  load  L  is 
independent  of  the  yield  stresses. 

The  design  vector  x  =  (xi,X2,  ...,X7)'  G  IR^,  where  Xi  is  the  cross-section  area  (in  1000  mm^) 
of  member  i.  The  truss  fails  if  any  of  the  members  exceed  their  yield  stress  and,  hence,  the 
probability  of  failure  is  —  L/Q  <  0}],  where  Ci  =  l/(2\/3)  for  i  =  1,2,  and  Ci  =  1/^3 

for  i  =  3,4,  ...,7  (see  [36]  for  details).  Using  the  approach  in  [36],  see  also  [34],  we  find  that  this 
probability  of  failure  can  be  approximated  with  high  accuracy  by 

f{x)  =  lE[max{p,  max  {1  -  x^(rf  (x,  w))}}  (42) 

2=1,...,7 

where  p  >  0  is  an  approximation  parameter  set  equal  to  20,  xiC')  Chi-square  cumulative  distri¬ 
bution  function  with  8  degrees  of  freedom,  uj  is  an  eight-dimensional  random  vector  of  independent 
standard  normal  random  variables  obtained  from  the  original  random  variables  (L,  Si, ...,  S7)  using 
a  Nataf  probability  transformation,  and  ri{-,Lo)  is  a  smooth  distance  function.  The  function  (42) 
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is  of  form  (1)  and  is  continuously  differentiable  under  moderate  assumptions  [34].  However,  the 
expression  inside  the  brackets  in  (42)  is  not  continuously  differentiable  everywhere  for  IP-almost 
every  cu  G  H.  We  overcome  this  difficulty  by  smoothing  that  expression  using  exponential  smooth¬ 
ing  with  smoothing  parameter  10^;  see  [30].  This  results  in  an  error  in  function  evaluation  due  to 
smoothing  of  less  than  2  •  10“^  for  all  x  G  IR^  and  w  G  H.  A  similar  smoothing  approach  is  used 
in  [1,  47]  for  Conditional  Value-at-risk  minimization.  This  problem  instance  illustrates  that  also 
nonsmooth  problem  instances  may  be  solved  approximately  by  Algorithm  2. 

The  goal  in  this  truss  design  problem,  denoted  TRUSS,  is  to  minimize  f{x)  subject  to  X]J=i  = 
3,  Xi  <  0.5,  Xi  >  0.2,  i  =  1,  2, ...,  7.  We  note  that  this  problem  instance  is  not  known  to  be  convex. 
We  use  Xq  =  (3/7,  ...,3/7)'  G  IR^  and  e  =  0.05p*.  In  this  problem  instance,  we  increase  the  relative 
tolerance  as  the  variability  is  rather  high  {a{x*)/ f*  ~  10). 

5.2  Computational  Results 

We  apply  Algorithm  2  with  different  sample-size  selection  policies  to  the  five  problem  instances. 
The  measure  of  performance  of  the  policies  is  the  time  required  in  Algorithm  2  until  (39)  exceeds 
the  threshold  0.95.  We  do  not  carry  out  a  rigorous  analysis  of  such  sequential  tests,  but  find  em¬ 
pirically  that  the  coverage  probabilities  for  Algorithm  2  with  this  stopping  criterion  is  satisfactory. 
Specifically,  Algorithm  2  stops  after  a  stage  k  with  an  that  fails  to  satisfy  /(x^^)  —  /*  <  e  in 
only  1%  of  320  independent  runs  on  QUADl,  which  is  well  within  the  5%  indicated  by  the  threshold 
0.95.  The  stopping  criterion  yields  satisfactory  coverage  also  on  the  other  problem  instances  and 
suffices  for  our  purpose  of  comparing  different  policies.  In  view  of  this  criterion,  we  select  N*  so 
that  the  variability  in  fN*{x/^^)  would  tend  not  to  prevent  (39)  from  exceeding  0.95.  That  is,  we  set 
N*  =  [((Ti4>“^(0.95)/(e/2))^],  which  is  the  smallest  sample  size  that  ensures  that  (39)  equals  0.95 
when  =  e/2  and  there  is  no  uncertainty  in  the  lower  bound,  i.e.,  “equals” 

infinity. 

Table  1  gives  average  computing  times  in  seconds  over  10  independent  runs  of  Algorithm 

2,  with  standard  deviations,  when  applied  to  QUADl  (columns  4-5),  QUAD2  (columns  6-7),  and 
QUAD3  (columns  8-9).  Each  row  represents  a  particular  policy  for  determining  (Nk,nk).  The  third 
row  gives  the  times  when  (V^,  rij/)  is  determined  by  the  S-SSCP  policy,  rows  4-6  give  times  for  an 
“additive  policy”  where  Ni  =  A'*/1000  and  =  A^i  -|-  {N*  —  A'i)A:/20,  k  =  2,3, ...,  with  Uk  =  5, 10, 
and  20,  respectively,  rows  7-9  give  times  for  a  “multiplicative  policy”  where  Ni  =  V*/1000  and 
Nk  =  1.5^“^Vi,  k  =  2,3, ...,  with  =  5, 10,  and  20,  respectively,  and  rows  10-12  give  times  for  the 
same  multiplicative  policy  as  the  previous  rows  except  that  =  2^~^Ni,  k  =  2,3, ....  Rows  13-24 
follow  policies  deduced  from  the  recommendation  in  [26] .  Specifically,  from  an  Ni  given  in  column 

3,  Nk  =  1.1^“^ Vi,  k  =  2,3,...,  for  rows  13-18  and  =  1.5^“^Ali,  k  =  2,3,...,  for  rows  19-24. 
The  number  of  iterations  carried  out  at  each  stage  is  determined  adaptively:  the  stage  ends  when 
||(V^/Ar^(xf))“^V/Ar^(x/)ll  <  K/y/N/.  Column  2  of  rows  13-24  gives  the  K  used.  The  policies  of 
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Name 

Policy 

nk 

Ni 

QUADl 
avg.  st.dev. 

QUAD2 
avg.  st.dev. 

QUAD3 
avg.  st.dev. 

S-SSCP 

adaptive 

adaptive 

182 

98 

240 

no 

223 

98 

Additive 

5 

NVlOOO 

242 

59 

412 

84 

564 

200 

Additive 

10 

N7IOOO 

434 

213 

394 

111 

611 

343 

Additive 

20 

N7IOOO 

352 

172 

365 

184 

662 

485 

Mult.  1.5 

5 

N7IOOO 

631 

267 

1351 

1368 

962 

514 

Mult.  1.5 

10 

N7IOOO 

666 

379 

480 

219 

589 

220 

Mult.  1.5 

20 

N7IOOO 

759 

460 

514 

220 

889 

609 

Mult.  2 

5 

N7IOOO 

494 

233 

3958 

3871 

997 

414 

Mult.  2 

10 

N7IOOO 

867 

426 

1112 

879 

1192 

814 

Mult.  2 

20 

N7IOOO 

1080 

1310 

537 

221 

948 

480 

Mult.  1.1 

K  =  1 

10 

1434 

170 

2522 

[4] 

- 

[0] 

Mult.  1.1 

K  =  1 

100 

1039 

229 

2514 

[9] 

- 

[0] 

Mult.  1.1 

K  =  1 

N7IOOO 

501 

200 

1185 

746 

976 

[1] 

Mult.  1.1 

K  =  0.1 

10 

1521 

318 

2524 

[3] 

- 

[0] 

Mult.  1.1 

K  =  0.1 

100 

1015 

323 

1567 

[3] 

2963 

[1] 

Mult.  1.1 

K  =  0.1 

N7IOOO 

974 

391 

1301 

[6] 

2502 

[3] 

Mult.  1.5 

K  =  1 

10 

591 

245 

1690 

[9] 

3349 

[1] 

Mult.  1.5 

K  =  1 

100 

419 

151 

1187 

[8] 

1836 

[2] 

Mult.  1.5 

K  =  1 

N7IOOO 

305 

79 

1094 

[8] 

2521 

[4] 

Mult.  1.5 

K  =  0.1 

10 

864 

427 

2076 

[6] 

1143 

[1] 

Mult.  1.5 

K  =  0.1 

100 

841 

778 

1794 

[6] 

1186 

[3] 

Mult.  1.5 

K  =  0.1 

N71000 

573 

514 

1925 

[4] 

2221 

[2] 

Table  1:  Average  and  standard  deviation  of  computing  times  (seconds)  over  ten  runs  of  Algorithm 
2  excluding  the  time  of  Step  1  when  applied  to  QUADl,  QUAD2,  and  QUADS.  Averages  are  only 
over  runs  completed  within  3600  seconds.  If  less  than  10  runs  finished  within  that  time  limit,  we 
report  the  number  that  did  finish  in  brackets. 


rows  13-24  are  asymptotically  optimal  in  a  sense  defined  in  [26].  We  note  that  A^*/1000  is  typically 
around  600  for  QUADl,  QUAD2,  and  QUADS,  respectively. 

In  all  cases,  is  set  to  3,000,000,  if  the  above  policies  propose  a  sample  size  of  larger  than 
3,000,000.  We  anticipate  that  in  real-world  application  of  the  proposed  methodology  the  computing 
time  required  to  approximately  solve  the  surrogate  sample-size  control  problem  (Step  1  of  Algorithm 
2)  will  be  negligible  compared  to  the  computing  time  of  Step  3  in  Algorithm  2.  Hence,  we  exclude 
the  time  of  Step  1  in  the  computing  times  reported  in  Table  1.  For  the  present  problem  instances, 
the  time  of  Step  1  is  typically  around  2.5  seconds  and  the  number  of  stages  is  typically  between  5 
and  15. 

We  see  from  Table  1  that  the  S-SSCP  policy  is  significantly  better  than  the  alternative  ones 
for  all  there  problem  instances.  The  first  additive  policy  (see  row  4  of  Table  1)  appears  to  be 
reasonably  efficient,  but  it  requires  roughly  twice  the  computing  time  of  the  S-SSCP  policy,  on 
average.  Other  alternative  policies  may  require  as  much  as  an  order  of  magnitude  more  computing 
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Name 

Policy 

nk 

w 

SEARCH 
avg.  st.dev. 

TRUSS 
avg.  st.dev. 

S-SSCP 

adaptive 

adaptive 

121 

54 

1593 

512 

Additive 

5 

A^VlOOO 

134 

74 

1992 

2240 

Additive 

10 

W/IOOO 

253 

158 

2173 

1369 

Additive 

20 

A^VlOOO 

210 

120 

6875 

6280 

Mult.  1.5 

5 

A^VlOOO 

173 

69 

2661 

2531 

Mult.  1.5 

10 

A^VlOOO 

204 

145 

2751 

1181 

Mult.  1.5 

20 

A^VlOOO 

443 

392 

6756 

7507 

Mult.  2 

5 

ivviooo 

122 

67 

3303 

2805 

Mult.  2 

10 

ivviooo 

340 

397 

4557 

4018 

Mult.  2 

20 

A^VlOOO 

404 

389 

10056 

8098 

Pixed 

5 

N*/2 

560 

171 

- 

[0] 

Eixed 

10 

N*/2 

942 

353 

- 

[0] 

Eixed 

25 

N*/2 

1138 

438 

- 

[0] 

Table  2:  Average  and  standard  deviation  of  computing  times  (seconds)  over  ten  runs  of  Algorithm 
2  excluding  the  time  of  Step  1  when  applied  to  SEARCH  and  TRUSS.  Averages  are  only  over  runs 
completed  within  15000  seconds.  If  less  than  10  runs  finished  within  that  time  limit,  we  report  the 
number  that  did  finish  in  brackets. 


time.  The  alternative  policies  deduced  from  the  recommendation  in  [26]  (see  rows  13-24  in  Table 
1)  perform  comparable  to  the  other  alternative  policies  on  QUADl  (but  still  much  worse  than  the 
S-SSCP  policy).  However,  they  result  in  extremely  poor  computing  times  for  QUAD2  and  QUADS 
with  many  runs  not  completed  within  one  hour.  It  appears  that  the  adaptive  rule  for  determining 
the  number  of  iterations  for  each  stage  in  these  policies  tends  to  result  in  “over-solving”  each  stage. 
We  also  examined  a  fixed  policy  with  Nf^  =  N* /2  and  =  5  for  all  k  on  QUADl  (not  reported  in 
Table  1),  but  computing  times  exceeded  15000  seconds  for  all  10  runs. 

In  view  of  the  results  of  Table  1,  we  see  that  even  on  simple  problem  instances  a  poor  choice 
of  sample-size  selection  policy  may  result  in  extremely  long  computing  times.  Moreover,  the  rec¬ 
ommendations  from  [26] ,  which  are  based  on  asymptotic  analysis  of  sampling  error  and  algorithmic 
progress,  may  not  be  helpful  in  practice.  In  fact,  on  the  problem  instances  examined,  these  rec¬ 
ommendations  perform  worse  than  simple  additive  or  multiplicative  polices.  On  the  other  hand, 
the  S-SSCP  policy  appears  to  be  robust  and  it  performs  well  even  on  ill-conditioned  problems.  In 
contrast  to  rigid  additive  and  multiplicative  polices,  S-SSCP  initially  recommends  many  iterations 
per  stage  but  reduces  the  number  as  the  sample  size  is  increased.  When  the  sample  size  is  large  and 
the  surrogate  terminal  state  is  almost  satisfied,  a  cautious  increase  in  sample  size  is  recommended. 

Table  2  gives  similar  result  as  Table  1  but  for  problem  instances  SEARCH  and  TRUSS.  Each 
row  represents  a  particular  policy  as  described  in  Table  1  with  the  addition  of  three  alternative 
policies  using  a  fixed  sample  size  =  N* /2  and  Uk  =  5,  10,  or  20  for  all  k,  see  rows  13-15  of  Table 
2.  We  do  not  examine  the  policies  from  [26]  due  to  their  poor  performance  in  Table  1.  On  SEARCH, 
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the  S-SSCP  policy  appears  to  be  the  fastest,  but  with  averages  over  only  10  runs  we  cannot  claim 
that  the  advantage  over  the  best  alternative  policy  is  statistically  significant.  We  notice,  however, 
that  the  alternative  policies  often  have  significantly  larger  standard  deviations  than  that  associated 
with  the  S-SSCP  policy.  Consequently,  the  user  of  an  alternative  policy  is  exposed  to  a  significant 
risk  of  having  a  long  computing  time  even  with  a  “good”  choice  of  alternative  policy. 

On  the  problem  instance  TRUSS  (see  columns  6  and  7  of  Table  2)  the  S-SSCP  policy  again 
outperforms  the  alternative  policies  with  substantial  margins.  In  this  case,  we  also  see  that  the 
alternative  policies  have  significantly  larger  standard  deviations  (coefficient  of  variations  of  roughly 
1)  than  that  of  the  S-SSCP  policy  (coefficient  of  variation  of  roughly  1/3).  Hence,  the  user  of 
an  alternative  policy  not  only  should  expect  a  longer  computing  time  on  average  as  compared 
to  the  S-SSCP  policy,  but  also  the  possibility  of  much  longer  times.  We  observe  that  the  best 
alternative  policy  for  SEARCH  (see  row  10  of  Table  2)  is  only  the  fifth  best  alternative  policy  for 
TRUSS.  Hence,  as  could  be  expected,  a  good  alternative  policy  for  one  problem  instance  may  not 
be  particularly  good  for  another.  Of  course,  this  makes  the  process  of  selecting  a  policy  manually 
or  by  trial- and-error  rather  difficult. 

6  Conclusions 

We  consider  the  solution  of  smooth  stochastic  programs  by  sample  average  approximations  and 
formulate  the  problem  of  selecting  efficient  sample  sizes  as  a  discrete-time  optimal-control  problem 
that  aims  to  minimize  the  expected  computing  time  to  reach  a  near-optimal  solution.  The  optimal- 
control  problem  is  intractable,  but  we  approximate  it  by  a  surrogate  sample-size  control  problem 
using  state  aggregation  and  the  result  of  a  novel  model  of  algorithmic  behavior.  The  surrogate 
sample-size  control  problem  depends  on  unknown  parameters  that  we  estimate  as  the  algorithm 
progresses.  Hence,  we  approximately  solve  the  control  problem  repeatedly  within  a  receding-horizon 
framework. 

Even  with  estimates  of  parameters,  the  surrogate  sample- size  control  problem  provides  a  policy 
for  selecting  sample  sizes  and  number  of  iterations  that  outperforms  plausible  alternative  policies 
including  policies  known  to  be  optimal  in  some  asymptotic  sense.  The  surrogate  sample-size  control 
problem  provides  a  policy  that  appears  to  be  robust  to  changing  characteristics  of  problem  instances 
such  as  ill-conditioning.  In  comparison,  the  alternative  policies  may  result  in  dramatically  varying 
computing  times.  Of  course,  we  do  not  examine  all  possible  policies  in  this  paper,  among  which 
there  is  likely  to  be  some  that  are  better  than  the  surrogate  sample-size  control  policy.  However, 
we  illustrate  the  difficulty  a  user  faces  when  selecting  a  policy  prior  to  calculations.  We  also  show 
that  guidance  provided  by  recommendations  in  the  literature  may  not  be  helpful  in  practice.  The 
approach  derived  in  this  paper  eliminates  the  need  for  users  to  select  a  policy  through  extensive  trial- 
and-error  or  guesswork  and,  hence,  facilitates  implementation  of  stochastic  programming  algorithms 
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in  decision-support  tools. 
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